Identifying User Task Using Associative Classification

 

Deepak Bhalla1* and Devesh Nerayan2

1M. Tech IV, Department of Computer Science and Engineering, Rungta College of Engineering and Technology, Bhilai. Chhattisgarh (India).

2Associate Prof., Department of Computer Science and Engineering, Rungta College of Engineering and Technology, Bhilai. Chhattisgarh (India).

*Corresponding Author E-mail: deepakbhalla.28@gmail.com

 

ABSTRACT:

The explosive growth of the web has led to a massive increase in the number of web sites and has boosted web mining technology. The aim of web mining is to analyze and discover useful information in data collected through the web, which includes identifying the task a web user is performing. If user tasks can be determined from browsing behavior, the usability of the web can be improved. This paper presents an approach for identifying the user task during navigation from the click stream. Using web logs, we prepared data comprising the different browsing events performed by the user. The prepared data was then applied to an associative classifier, a classification approach that integrates classification and association rule mining. The paper reports the performance of the CBA (Classification Based on Association) algorithm in identifying the task performed by the web user, together with the number of rules generated under varying support and confidence.

 

KEYWORDS: Classification, Association Rule Mining, Associative Classifier, Decision Tree, CBA.

 


 

I. INTRODUCTION:

The advent of the World-Wide Web (WWW) [Berners-Lee et al. 1994] has overwhelmed home computer users with an enormous flood of information. On almost any topic one can think of, one can find information made available by other internet citizens, ranging from individual users who post an inventory of their record collection to major companies that do business over the Web.

 

This explosive use of the web has led to a massive increase in the number of web sites and has boosted web mining technology. Several data mining methods are used to discover hidden information in the Web [J. Srivastava et al. 2000]. The mining algorithms have to be modified so that they better suit the requirements of the web user, and new approaches are needed that better fit the characteristics of Web data. Web mining has thus developed into an autonomous research area.

 

The broad goal of web mining algorithms is to present information to the user clearly and concisely, and to make the right choices apparent. The growth of business, web logs and social linking on the web has greatly raised the importance of web mining tools. Such tools help to analyze users’ behavior and help the analyst place the most important content in the right position on a web page or in a web application. Overall, Web Mining focuses on the analysis and prediction of visitors’ usage in order to improve website performance and/or suggest links based on users’ behavior.

 

The main motivation for our work is to investigate an approach for extracting declarative knowledge about Web users’ browsing and access patterns. The goal of this paper is to find a possible representation for generating first-order rules for Web site traffic patterns. We used a web event log generated by a self-developed tool [Anne et al.] to find user access patterns. We divided user tasks into three major areas: Browsing, Fact Finding, and Information Gathering. We then applied CBA to classify each user into one of these categories, so that the performance of the web can be improved.

 

II. BACKGROUND THEORY:

Knowledge discovery is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [Fayyad et al. 1996].

 

Data mining is the application of one or several data mining algorithms in a knowledge discovery process with the purpose of extracting patterns from a set of example data and possibly some background knowledge.

 

With the aim of finding and extracting relevant information hidden in web-related activities, Web Mining can be divided into three categories based on which part of the web is mined [R. Kosala et al. 2000]: (i) Web content mining, (ii) Web structure mining and (iii) Web usage mining.

 

Web Usage Mining analyses the usage patterns of web sites in order to gain an improved understanding of users’ interests and requirements. This information is especially valuable for E-Business sites seeking improved customer satisfaction. We currently focus on applying web usage mining to automatically determine the Web User Task. This in turn can help to optimize the website and to increase user satisfaction by presenting the reader with more precise alternatives for reading a document.

 

Users interact differently with their web browsers depending on the purpose of their work, their surfing behavior, and the structure and content of the web site; user tasks therefore differ in their characteristics. Kellar et al. combined these characteristics and arrived at the different types of task that a web user can perform. Accordingly, the User Task can be modeled with the following taxonomy: “Fact Finding”, “Information Gathering”, “Just Browsing”, “Transactions” and “Other”.

 

III. OVERVIEW OF CLASSIFICATION:

3.1      Associative Classification

Association pattern discovery and classification rule mining, both essential data mining tasks, have been integrated into an approach called associative classification. Association rule mining aims to discover useful and interesting patterns in the data that satisfy a minimum support and confidence, while the purpose of classification is to build a model that assigns a predefined target, called the class label, to a record on the basis of its attributes. Integrating the two, associative classification determines all association rules in the data that satisfy the specified minimum support and confidence and whose consequent is a class label; these rules are then used to build an accurate classifier.
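
The roles of support and confidence can be illustrated with a minimal sketch. It assumes that each record is a set of discretized event items plus a task label; the item names, the toy records and the function class_association_rules are hypothetical and are not taken from our data. Rules are enumerated by brute force here, which is only practical for small examples, whereas CBA [1] uses an Apriori-style candidate generation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: (set of discretized event items, task label).
RECORDS = [
    ({"clicks=high", "scroll=low"}, "Browsing"),
    ({"clicks=high", "scroll=high"}, "Browsing"),
    ({"clicks=low", "scroll=high"}, "Fact Finding"),
    ({"clicks=low", "scroll=low"}, "Fact Finding"),
    ({"clicks=high", "scroll=low"}, "Information Gathering"),
]


def class_association_rules(records, min_support, min_confidence, max_len=2):
    """Return rules (condset, label, support, confidence) meeting both thresholds.

    support    = (records containing condset and carrying label) / (all records)
    confidence = (records containing condset and carrying label)
                 / (records containing condset)
    """
    n = len(records)
    cond_count = Counter()  # how often each condition set occurs
    rule_count = Counter()  # how often it occurs together with a given label
    for items, label in records:
        for size in range(1, max_len + 1):
            for condset in combinations(sorted(items), size):
                cond_count[condset] += 1
                rule_count[(condset, label)] += 1
    rules = []
    for (condset, label), hits in rule_count.items():
        support = hits / n
        confidence = hits / cond_count[condset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((condset, label, support, confidence))
    return rules


rules = class_association_rules(RECORDS, min_support=0.2, min_confidence=0.6)
for condset, label, support, confidence in rules:
    print(condset, "->", label, round(support, 2), round(confidence, 2))
```

Raising either threshold can only remove rules from the result, which is why fewer rules are expected at higher support and confidence.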

 

3.2   CBA Algorithm

Explaining concepts of CBA Algorithm
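
As described by Liu, Hsu and Ma [1], CBA consists of a rule generator, which finds the class association rules that meet the minimum support and confidence, and a classifier builder, which orders these rules and keeps a subset that covers the training data. The following is a simplified sketch of the second stage only. It assumes the rules are already available as (condition set, label, support, confidence) tuples, and it omits the final error-based truncation of the rule list used in the original algorithm. The toy rules, records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy training records: (set of items, task label).
RECORDS = [
    ({"clicks=high", "scroll=low"}, "Browsing"),
    ({"clicks=high", "scroll=high"}, "Browsing"),
    ({"clicks=low", "scroll=high"}, "Fact Finding"),
    ({"clicks=low", "scroll=low"}, "Fact Finding"),
    ({"clicks=high", "scroll=low"}, "Information Gathering"),
]

# Hypothetical rules: (condition set, label, support, confidence).
RULES = [
    (frozenset({"clicks=high"}), "Browsing", 0.4, 2 / 3),
    (frozenset({"clicks=low"}), "Fact Finding", 0.4, 1.0),
]


def build_classifier(rules, records):
    """Greedy database-coverage selection over precedence-sorted rules."""
    # Precedence: higher confidence, then higher support, then earlier rule.
    ordered = sorted(enumerate(rules), key=lambda p: (-p[1][3], -p[1][2], p[0]))
    remaining = list(records)
    selected = []
    for _, rule in ordered:
        condset, label = rule[0], rule[1]
        covered = [r for r in remaining if condset <= r[0]]
        # Keep the rule only if it classifies a remaining record correctly.
        if any(lbl == label for _, lbl in covered):
            selected.append(rule)
            remaining = [r for r in remaining if not condset <= r[0]]
        if not remaining:
            break
    # Default class: majority label among uncovered records, else all records.
    pool = remaining if remaining else records
    default_label = Counter(lbl for _, lbl in pool).most_common(1)[0][0]
    return selected, default_label


def predict(selected, default_label, items):
    """Apply the first selected rule whose condition set is contained in items."""
    for condset, label, _support, _confidence in selected:
        if condset <= items:
            return label
    return default_label


selected, default_label = build_classifier(RULES, RECORDS)
print(predict(selected, default_label, {"clicks=low", "scroll=high"}))
```

Because the rules are tried in precedence order, the most confident matching rule decides the class, and the default label is used only when no rule applies.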

IV. WEB DATA:

The first step in extracting information about the user’s behavior is to collect suitable data for Web Mining. Web data can be the link structure between web pages, the content of the web pages, or information about the user and their browsing behavior. These different categories of data can be stored in server logs, which serve as the web data source.

 

4.1   Data Source

An exploratory study was conducted by Clemens Cap in which twenty participants were asked to perform exercises on one version of a site. The exercises correspond to the user tasks Fact Finding, Information Gathering and Just Browsing [Anne et al.]. These tasks are drawn from the following taxonomy [Kellar et al.]:

 

Fact Finding:

Information Gathering:

Just Browsing:

 

During the experiment, the various activities performed by the users were collected in log files. The log files contain the list of events occurring in the browser, from which behavioral attributes can be extracted.

 

4.2   Data Preparation

In collaboration with the German research team, we received the event logs collected during their experimental study. The event logs had been divided into different sections at the time of preparation [Anne et al.]. Because browsing behavior changes with the user task, we extracted several behavioral attributes from the different sections of these event logs. These behavioral attributes include the various click-stream events. We counted the occurrences of the different events in every section and prepared a data file for our experiment accordingly.
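
The counting step can be sketched as follows. This is a minimal illustration that assumes the log has already been parsed into (section, event) pairs; the event names, the section identifiers and the function count_events_per_section are hypothetical, and the actual log format is not reproduced here.

```python
from collections import Counter

# Hypothetical parsed event log: (section_id, event_name) in browsing order.
EVENT_LOG = [
    (1, "click"), (1, "scroll"), (1, "click"),
    (2, "back"), (2, "click"),
]

# Assumed set of behavioral attributes, one column per event type.
EVENT_TYPES = ["click", "scroll", "back"]


def count_events_per_section(event_log, event_types):
    """Return {section_id: [count of each event type, in event_types order]}."""
    counts = {}
    for section_id, event in event_log:
        counts.setdefault(section_id, Counter())[event] += 1
    # A Counter returns 0 for event types that never occurred in a section.
    return {sid: [c[e] for e in event_types] for sid, c in counts.items()}


rows = count_events_per_section(EVENT_LOG, EVENT_TYPES)
for section_id, row in rows.items():
    print(section_id, row)
```

Each resulting row, together with the task label of its section, would form one record of the data file.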

 

V. EXPERIMENT COMPARISONS:

In this section, we present a study of CBA, an associative classification algorithm. We also present a comparative study of the associative classifier under different support and confidence settings, which yield different numbers of rules and different accuracies.

 

5.1   Experiment

The prepared data, containing the records of various click-stream and mouse events, was first converted into a data file format compatible with the CBA associative classifier. The output consisted of the CBA rules and frequent patterns satisfying the specified minimum support and confidence. The CBA rules identified associations among the event attributes, and the frequent item sets were also reported. Finally, the output included the final classifier.

 

5.2   Experiment Results

The results of our experiment with different support and confidence settings are shown in Table I. The highest accuracy, 85.53%, was obtained with nine rules at a support of 5 and a confidence of 50. The minimum of three rules was generated with the default support of 20 and a confidence of 50%, giving an accuracy of 73.68%.

 

Table I. Comparison of Associative Classifier with different Support and Confidence

 

Support / Confidence    Number of Rules    Accuracy (%)
5/10                    12                 84.21
10/20                   4                  82.89
20/80                   3                  73.68
5/50                    9                  85.53
10/50                   4                  82.89

 

The accuracy at different support and confidence settings is shown in Graph 1. The graph shows that accuracy decreases as support increases.
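
The trend can be checked directly against the values reported in Table I. The short sketch below only restates those values; the function best_accuracy_by_support is a hypothetical helper introduced for illustration.

```python
# Rows from Table I: (support, confidence, number of rules, accuracy in %).
TABLE_I = [
    (5, 10, 12, 84.21),
    (10, 20, 4, 82.89),
    (20, 80, 3, 73.68),
    (5, 50, 9, 85.53),
    (10, 50, 4, 82.89),
]


def best_accuracy_by_support(rows):
    """Map each support value to the highest accuracy reported for it."""
    best = {}
    for support, _confidence, _n_rules, accuracy in rows:
        best[support] = max(accuracy, best.get(support, accuracy))
    return dict(sorted(best.items()))


by_support = best_accuracy_by_support(TABLE_I)
for support, accuracy in by_support.items():
    print(support, accuracy)
```

For the settings tested, the best accuracy at each support level falls as the support threshold rises.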

 

Graph 1: Comparative Performance of Associative Classifier for Accuracy

 

The variation in the number of rules at different support and confidence settings is shown in Graph 2. The graph shows that the associative classifier generates fewer rules at higher support and confidence, and that the number of rules increases as support and confidence are reduced.

 

Graph 2: Comparative Performance of Associative Classifier for Number of Rules Generated

VI. CONCLUSION

This paper has presented an approach in which a web event log is applied to a classification approach called associative classification. We have also shown how user activity can be traced or separated through the click stream, which helps to determine user behavior at the time of navigation. We have shown that the identification of the user task depends not only on users’ behavior but also on the number of rules generated and their accuracy, which in turn affects the effectiveness of the association-based classifier. The experimental results show that identifying the performed task is usable for further improvement of the web. Our methodology produces useful results because:

1.      It offers a new approach to identify user behavior.

2.      It shows the relation between the click stream features.

3.      It may help to identify the working area of the user: Fact Finding, Browsing or Information Gathering.

 

VII.      FUTURE SCOPE:

In this paper, we have presented the use of event attributes from log files to generate rules. The next question is how the generated rules can be applied to increase the quality of a web site. Can this be automated, for example with an expert system integrated into the Web server? This may be useful for improving pattern finding in search tools and for ranking user activities and web pages.

 

VIII. REFERENCES:

[1]     B. Liu, W. Hsu, and Y. Ma. “Integrating Classification And Association Rule Mining”. In Proceedings of the Fourth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 80–86, New York, NY, August 1998.

[2]     Fayyad U.M., Piatetsky-Shapiro G. and Smyth P. “From Data Mining to Knowledge Discovery”. In Fayyad U.M., Piatetsky-Shapiro G., Smyth P. and Uthurusamy R., editors, Advances in Knowledge Discovery and Data Mining, pages 1–34. MIT Press, 1996.

[3]     G. Dong, X. Zhang, L.Wong, and J. Li. “CAEP: Classification by Aggregating Emerging Patterns”. In Proceedings of The Second International Conference on Discovery Science (DS’99), pages 43–55, Japan, December 1999.

[4]     J. Srivastava, R. Cooley, M. Deshpande, P.-N. Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from web data”, SIGKDD Explorations, 1(2), 2000, 12–23

[5]     R. Kosala, H. Blockeel, “Web Mining Research: A Survey”, SIGKDD Explorations, 2(1), 2000, 1–15

[6]     T. Berners-Lee, R. Cailliau, A. Luotonen, H. Nielsen, and A. Secret. “The World-Wide Web”. Communications of the ACM, 37(8):76–82, 1994.

[7]     Y. Sun, A. K. C. Wong and Y. Wong “An Overview of Associative classifiers”, Int. Conf. on Data Mining 2006, pp. 138-14.

 

 

Received on 08.06.2011       Accepted on 18.07.2011     

© EnggResearch.net All Right Reserved

Int. J. Tech. 1(2): July-Dec. 2011; Page 62-64